risk tolerance
Classical AI vs. LLMs for Decision-Maker Alignment in Health Insurance Choices
Mainali, Mallika, Sureshbabu, Harsha, Sen, Anik, Rauch, Christopher B., Reifsnyder, Noah D., Meyer, John, Turner, J. T., Floyd, Michael W., Molineaux, Matthew, Weber, Rosina O.
As algorithmic decision-makers are increasingly applied to high-stakes domains, AI alignment research has evolved from a focus on universal value alignment to context-specific approaches that account for decision-maker attributes. Prior work on Decision-Maker Alignment (DMA) has explored two primary strategies: (1) classical AI methods integrating case-based reasoning, Bayesian reasoning, and naturalistic decision-making, and (2) large language model (LLM)-based methods leveraging prompt engineering. While both approaches have shown promise in limited domains such as medical triage, their generalizability to novel contexts remains underexplored. In this work, we implement a prior classical AI model and develop an LLM-based algorithmic decision-maker evaluated using a large reasoning model (GPT-5) and a non-reasoning model (GPT-4) with weighted self-consistency under a zero-shot prompting framework, as proposed in recent literature. We evaluate both approaches on a health insurance decision-making dataset annotated for three target decision-makers with varying levels of risk tolerance (0.0, 0.5, 1.0). In the experiments reported herein, classical AI and LLM-based models achieved comparable alignment with attribute-based targets, with classical AI exhibiting slightly better alignment for a moderate risk profile.
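The abstract names weighted self-consistency but does not spell out the weighting. As a rough, hypothetical illustration, the sketch below shows a generic weighted majority vote over sampled LLM choices. Each sampled answer carries a weight (for example, a confidence or alignment score; this is an assumption here), and the option with the largest total weight wins. The function name `weighted_vote` and the example options are invented for illustration. The weighting scheme in the cited literature may differ.

```python
from collections import defaultdict


def weighted_vote(samples):
    """Return the choice with the largest summed weight.

    `samples` is an iterable of (choice, weight) pairs, e.g. one pair per
    sampled LLM response. Ties go to the choice first seen with the top total.
    """
    totals = defaultdict(float)
    for choice, weight in samples:
        totals[choice] += weight
    if not totals:
        raise ValueError("need at least one sample")
    return max(totals, key=totals.get)


# Hypothetical samples for a two-plan insurance choice.
samples = [("plan_a", 0.6), ("plan_b", 0.9), ("plan_a", 0.5)]
print(weighted_vote(samples))  # plan_a (1.1 vs. 0.9)
```

A plain (unweighted) self-consistency vote is the special case where every weight is 1.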
Market Scoring Rules Act As Opinion Pools For Risk-Averse Agents
Mithun Chakraborty, Sanmay Das
A market scoring rule (MSR), a popular tool for designing algorithmic prediction markets, is an incentive-compatible mechanism for the aggregation of probabilistic beliefs from myopic risk-neutral agents. In this paper, we add to a growing body of research aimed at understanding the precise manner in which the price process induced by an MSR incorporates private information from agents who deviate from the assumption of risk-neutrality. We first establish that, for a myopic trading agent with a risk-averse utility function, an MSR satisfying mild regularity conditions elicits the agent's risk-neutral probability conditional on the latest market state rather than her true subjective probability. Hence, we show that an MSR under these conditions effectively behaves like a more traditional method of belief aggregation, namely an opinion pool, for agents' true probabilities.
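The sketch below makes the risk-neutral versus risk-averse contrast concrete. It uses the logarithmic market scoring rule (LMSR) for a binary event. Moving the price from q to r pays b·ln(r/q) if the event occurs and b·ln((1 - r)/(1 - q)) otherwise. A myopic agent picks r to maximize her expected utility:

- With linear (risk-neutral) utility, the optimum is her belief p.
- With log utility (an assumed risk-averse utility), the price moves only part of the way from q toward p.

The parameter values (b, wealth, p, q) are arbitrary. The sketch does not reproduce the paper's characterization of which opinion pool arises.

```python
import math


def lmsr_payoff(q, r, outcome, b):
    """LMSR payoff for moving the price of a binary event from q to r."""
    if outcome == 1:
        return b * math.log(r / q)
    return b * math.log((1.0 - r) / (1.0 - q))


def expected_utility(r, p, q, b, wealth, utility):
    """Agent's expected utility of reporting r, given subjective belief p."""
    return (p * utility(wealth + lmsr_payoff(q, r, 1, b))
            + (1.0 - p) * utility(wealth + lmsr_payoff(q, r, 0, b)))


def optimal_report(p, q, b, wealth, utility, lo=1e-4, hi=1.0 - 1e-4, iters=200):
    """Ternary search for the utility-maximizing price r.

    Valid because the objective is concave in r for concave, nondecreasing
    utilities; lo/hi must keep wealth + payoff positive for log utility.
    """
    def f(r):
        return expected_utility(r, p, q, b, wealth, utility)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0


params = dict(p=0.8, q=0.5, b=1.0, wealth=10.0)
risk_neutral = optimal_report(utility=lambda x: x, **params)
risk_averse = optimal_report(utility=math.log, **params)
print(round(risk_neutral, 4))  # 0.8: reports her true belief
print(risk_averse)             # lies strictly between q = 0.5 and p = 0.8
```

The risk-averse agent's move stops short of her belief, so the resulting price blends the prior market state with her private information. This is the intuition behind reading the MSR as an opinion pool.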
DevLicOps: A Framework for Mitigating Licensing Risks in AI-Generated Code
Sharma, Pratyush Nidhi, Wright, Lauren, Herfurth, Anne, Sokiyna, Munsif, Sharma, Pratyaksh Nidhi, Das, Sethu, Siponen, Mikko
Generative AI coding assistants (ACAs) are widely adopted yet pose serious legal and compliance risks. ACAs can generate code governed by restrictive open-source licenses (e.g., GPL), potentially exposing companies to litigation or forced open-sourcing. Few developers are trained in these risks, and legal standards vary globally, especially with outsourcing. Our article introduces DevLicOps, a practical framework that helps IT leaders manage ACA-related licensing risks through governance, incident response, and informed tradeoffs. As ACA adoption grows and legal frameworks evolve, proactive license compliance is essential for responsible, risk-aware software development in the AI era.
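One way an IT team might operationalize part of such governance is an automated gate in the code-review pipeline. The sketch below is hypothetical. It assumes some scanner already reports the SPDX license identifiers matched in a generated snippet, and it maps them to illustrative actions (`block`, `escalate`, `allow`). The policy sets and action names are assumptions for illustration, not part of DevLicOps as described in the abstract, and not legal advice.

```python
# Illustrative policy only: which licenses count as restrictive, and what
# each action means, are organizational choices.
RESTRICTIVE = {"GPL-2.0-only", "GPL-2.0-or-later", "GPL-3.0-only",
               "GPL-3.0-or-later", "AGPL-3.0-only", "AGPL-3.0-or-later"}
PRE_APPROVED = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause"}


def review_action(detected_licenses):
    """Map SPDX IDs matched in an ACA-generated snippet to a review action.

    `detected_licenses` is assumed to come from some license/snippet scanner.
    """
    detected = set(detected_licenses)
    if detected & RESTRICTIVE:
        return "block"      # copyleft match: route to legal before merging
    if detected - PRE_APPROVED:
        return "escalate"   # unlisted license: needs a human decision
    return "allow"          # no match, or only pre-approved licenses


print(review_action(["MIT"]))                  # allow
print(review_action(["MIT", "GPL-3.0-only"]))  # block
print(review_action(["LicenseRef-custom"]))    # escalate
```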
AlphaAgents: Large Language Model based Multi-Agents for Equity Portfolio Constructions
Zhao, Tianjiao, Lyu, Jingrao, Jones, Stokes, Garber, Harrison, Pasquali, Stefano, Mehta, Dhagash
The field of artificial intelligence (AI) agents is evolving rapidly, driven by the capabilities of Large Language Models (LLMs) to autonomously perform and refine tasks with human-like efficiency and adaptability. In this context, multi-agent collaboration has emerged as a promising approach, enabling multiple AI agents to work together to solve complex challenges. This study investigates the application of role-based multi-agent systems to support stock selection in equity research and portfolio management. We present a comprehensive analysis performed by a team of specialized agents and evaluate their stock-picking performance against established benchmarks under varying levels of risk tolerance. Furthermore, we examine the advantages and limitations of employing multi-agent frameworks in equity analysis, offering critical insights into their practical efficacy and implementation challenges.
Risk in Stochastic and Robust Model Predictive Path-Following Control for Vehicular Motion Planning
Tolksdorf, Leon, Tejada, Arturo, van de Wouw, Nathan, Birkner, Christian
In automated driving, risk describes potential harm to passengers of an autonomous vehicle (AV) and other road users. Recent studies suggest that human-like driving behavior emerges from embedding risk in AV motion planning algorithms. Additionally, providing evidence that risk is minimized during AV operation is essential to vehicle safety certification. However, there is as yet no consensus on how to define and operationalize risk in motion planning or how to bound or minimize it during operation. In this paper, we define a stochastic risk measure and introduce it as a constraint into both robust and stochastic nonlinear model predictive path-following controllers (RMPC and SMPC, respectively). We compare the vehicle's behavior under SMPC and RMPC with respect to safety and path-following performance. Further, we provide an implementation of an automated driving example that showcases the effects of different risk tolerances and uncertainty growth in predictions of other road users for both cases. We find that the RMPC is significantly more conservative than the SMPC, while also displaying larger path-following errors with respect to the reference. Further, the RMPC's behavior cannot be considered human-like. The RMPC generates undesired driving behavior even for moderate uncertainties, which the SMPC handles better. Introducing autonomous vehicles (AVs) into traffic at scale will take a long period during which AVs and human-controlled vehicles will share the roads.
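A one-dimensional toy check can illustrate the contrast between a stochastic (chance-constrained) and a robust treatment of prediction uncertainty. The sketch assumes the other road user's predicted longitudinal position is Gaussian, with a standard deviation that grows linearly over the horizon. The two checks work as follows:

- The stochastic check accepts a planned ego trajectory if the per-step probability of coming within `d_safe` stays below a risk tolerance.
- The robust check rejects the trajectory if the safety margin touches a 3-sigma bounding interval.

This is not the paper's risk measure or its controllers. The scenario numbers, the 3-sigma set and the function names are assumptions.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def stochastic_risk(x_ego, mean_other, sigmas, d_safe):
    """Largest per-step probability that the other road user lies within
    d_safe of the ego vehicle, with a Gaussian position prediction."""
    return max(
        normal_cdf((x + d_safe - m) / s) - normal_cdf((x - d_safe - m) / s)
        for x, m, s in zip(x_ego, mean_other, sigmas)
    )


def robust_feasible(x_ego, mean_other, sigmas, d_safe, k_sigma=3.0):
    """True if the ego safety interval never overlaps the bounded
    uncertainty set [m - k_sigma*s, m + k_sigma*s] at any step."""
    for x, m, s in zip(x_ego, mean_other, sigmas):
        lo, hi = m - k_sigma * s, m + k_sigma * s
        if x + d_safe > lo and x - d_safe < hi:
            return False
    return True


# Toy 5-step horizon (positions in metres); the other user stays 20 m ahead.
x_ego = [0.0, 10.0, 20.0, 30.0, 40.0]
mean_other = [20.0, 30.0, 40.0, 50.0, 60.0]
sigma0, growth = 1.0, 1.0  # uncertainty grows linearly with the step
sigmas = [sigma0 + growth * k for k in range(len(x_ego))]
d_safe, risk_tolerance = 6.0, 0.01

risk = stochastic_risk(x_ego, mean_other, sigmas, d_safe)
print(risk <= risk_tolerance)                              # True: chance constraint met
print(robust_feasible(x_ego, mean_other, sigmas, d_safe))  # False: robust check rejects
```

In this toy case the robust check rejects a plan that the stochastic check accepts. That mirrors, in miniature, the greater conservatism the abstract reports for the RMPC.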
Evaluating AI for Finance: Is AI Credible at Assessing Investment Risk?
Chawla, Divij, Bhutada, Ashita, Anh, Do Duc, Raghunathan, Abhinav, SP, Vinod, Guo, Cathy, Liew, Dar Win, Gupta, Prannaya, Bhardwaj, Rishabh, Bhardwaj, Rajat, Poria, Soujanya
We assess whether AI systems can credibly evaluate investment risk appetite, a task that must be thoroughly validated before it is automated. Our analysis covers proprietary systems (GPT, Claude, Gemini) and open-weight models (LLaMA, DeepSeek, Mistral), using carefully curated user profiles that reflect real users with varying attributes such as country and gender. The models exhibit significant variance in score distributions when user attributes that should not influence risk computation, such as country or gender, are changed. For example, GPT-4o assigns higher risk scores to Nigerian and Indonesian profiles. While some models align closely with expected scores in the Low- and Mid-risk ranges, none maintain consistent scores across regions and demographics, thereby violating AI and finance regulations.
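A simple counterfactual consistency check captures the kind of test described here. Score otherwise identical profiles that differ only in one attribute (such as country), then compare the mean scores across attribute values. The helper below is a hypothetical sketch. The group labels and scores in the usage line are placeholders, not results from the paper, and the tolerance used to flag a gap would be a policy choice.

```python
from statistics import mean


def attribute_score_gap(scores_by_value):
    """Return the largest gap in mean risk score across attribute values.

    `scores_by_value` maps each value of an attribute that should not affect
    the score (e.g. country) to the scores a model gave to otherwise identical
    profiles carrying that value.
    """
    means = {value: mean(scores) for value, scores in scores_by_value.items()}
    return max(means.values()) - min(means.values()), means


# Placeholder scores on a hypothetical 1-7 risk scale.
gap, means = attribute_score_gap({"country_a": [3, 3, 3], "country_b": [4, 5, 6]})
print(gap)  # 2
```

A gap of zero (up to sampling noise) is what consistent scoring would look like. Any persistent gap indicates the attribute is leaking into the risk computation.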
Mitigating Gambling-Like Risk-Taking Behaviors in Large Language Models: A Behavioral Economics Approach to AI Safety
Large Language Models (LLMs) exhibit systematic risk-taking behaviors analogous to those observed in gambling psychology, including overconfidence bias, loss-chasing tendencies, and probability misjudgment. Drawing from behavioral economics and prospect theory, we identify and formalize these "gambling-like" patterns, in which models sacrifice accuracy for high-reward outputs, exhibit escalating risk-taking after errors, and systematically miscalibrate uncertainty. We propose the Risk-Aware Response Generation (RARG) framework, incorporating insights from gambling research to address these behavioral biases through risk-calibrated training, loss-aversion mechanisms, and uncertainty-aware decision-making. Our approach introduces novel evaluation paradigms based on established gambling psychology experiments, including AI adaptations of the Iowa Gambling Task and probability learning assessments. Experimental results demonstrate measurable reductions in gambling-like behaviors: an 18.7% decrease in overconfidence bias, a 24.3% reduction in loss-chasing tendencies, and improved risk calibration across diverse scenarios. This work establishes the first systematic framework for understanding and mitigating gambling psychology patterns in AI systems.
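Miscalibrated uncertainty of the kind described here is commonly quantified with expected calibration error (ECE). Predictions are grouped into equal-width confidence bins, and the gap between mean confidence and accuracy in each bin is averaged, weighted by bin size. The abstract does not say which calibration metric RARG uses. The sketch below is therefore a generic ECE implementation, offered as an assumption and run on toy inputs.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Generic ECE over equal-width bins on [0, 1].

    `confidences` are predicted probabilities of being right; `correct`
    holds 1 for a correct answer and 0 otherwise.
    """
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((c, y))
    ece = 0.0
    for members in bins:
        if members:
            avg_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(y for _, y in members) / len(members)
            ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# Toy example: 90%-confident answers are right only half the time.
print(expected_calibration_error([0.9, 0.9, 0.6, 0.6], [1, 0, 1, 1]))  # about 0.4
```

Overconfidence shows up specifically as bins where mean confidence exceeds accuracy. The 0.9 bin in the toy example is one such bin.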
Exploring Cognitive Attributes in Financial Decision-Making
Mainali, Mallika, Weber, Rosina O.
Second Workshop on Metacognitive Prediction of AI Behavior. Mallika Mainali and Rosina O. Weber, Drexel University, Philadelphia, PA 19104, USA.
Cognitive attributes are fundamental to metacognition, shaping how individuals process information, evaluate choices, and make decisions. To develop metacognitive artificial intelligence (AI) models that reflect human reasoning, it is essential to account for the attributes that influence reasoning patterns and decision-maker behavior, which often lead to different or even conflicting choices. This makes it crucial to incorporate cognitive attributes when designing AI models that align with human decision-making processes, especially in high-stakes domains such as finance, where decisions have significant real-world consequences. However, existing AI alignment research has primarily focused on value alignment, often overlooking the role of individual cognitive attributes that distinguish decision-makers. To address this issue, this paper (1) analyzes the literature on cognitive attributes, (2) establishes five criteria for defining them, and (3) categorizes 19 domain-specific cognitive attributes relevant to financial decision-making.
A Frontier AI Risk Management Framework: Bridging the Gap Between Current AI Practices and Established Risk Management
Campos, Simeon, Papadatos, Henry, Roger, Fabien, Touzet, Chloé, Murray, Malcolm, Quarks, Otter
The recent development of powerful AI systems has highlighted the need for robust risk management frameworks in the AI industry. Although companies have begun to implement safety frameworks, current approaches often lack the systematic rigor found in other high-risk industries. This paper presents a comprehensive risk management framework for the development of frontier AI that bridges this gap by integrating established risk management principles with emerging AI-specific practices. The framework consists of four key components: (1) risk identification (through literature review, open-ended red-teaming, and risk modeling), (2) risk analysis and evaluation using quantitative metrics and clearly defined thresholds, (3) risk treatment through mitigation measures such as containment, deployment controls, and assurance processes, and (4) risk governance establishing clear organizational structures and accountability. Drawing from best practices in mature industries such as aviation or nuclear power, while accounting for AI's unique challenges, this framework provides AI developers with actionable guidelines for implementing robust risk management. The paper details how each component should be implemented throughout the life-cycle of the AI system - from planning through deployment - and emphasizes the importance and feasibility of conducting risk management work prior to the final training run to minimize the burden associated with it.
How to Choose a Threshold for an Evaluation Metric for Large Language Models
Sarmah, Bhaskarjit, Li, Mingshu, Lyu, Jingrao, Frank, Sebastian, Castellanos, Nathalia, Pasquali, Stefano, Mehta, Dhagash
Various evaluation metrics have been proposed in the literature to monitor large language models (LLMs) and ensure their reliability. However, there is little research prescribing a methodology for identifying a robust threshold on these metrics, even though an incorrect choice of threshold can have serious implications when deploying LLMs. Translating traditional model risk management (MRM) guidelines from regulated industries such as finance, we propose a step-by-step recipe for picking a threshold for a given LLM evaluation metric. We emphasize that such a methodology should start by identifying the risks of the LLM application under consideration and the risk tolerance of the stakeholders. We then propose concrete and statistically rigorous procedures to determine a threshold for the given LLM evaluation metric using available ground-truth data. As a concrete example of the proposed methodology at work, we apply it to the Faithfulness metric, as implemented in various publicly available libraries, using the publicly available HaluBench dataset. We also lay a foundation for systematic approaches to threshold selection, not only for LLMs but for any GenAI application.
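A bare-bones version of the threshold-selection idea is shown below, under assumptions the abstract does not fix:

- An answer is accepted when its faithfulness score is at least t.
- Ground-truth labels mark each answer as faithful (1) or not (0).
- The stakeholders' risk tolerance is the maximum acceptable fraction of unfaithful answers that get accepted.

The function returns the lowest (most permissive) threshold that meets that tolerance on the labelled data. It is a point estimate only. The statistically rigorous procedures the paper proposes, such as accounting for sampling uncertainty, are not reproduced. The function name is hypothetical, and the scores are toy values, not HaluBench data.

```python
import math


def pick_threshold(scores, labels, max_false_accept):
    """Lowest threshold t such that the share of unfaithful answers (label 0)
    with score >= t is at most `max_false_accept`.

    Returns math.inf (accept nothing) if no observed score qualifies.
    """
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not negatives:
        return min(scores)  # nothing to guard against on this data
    for t in sorted(set(scores)):  # false-accept rate falls as t rises
        false_accept = sum(s >= t for s in negatives) / len(negatives)
        if false_accept <= max_false_accept:
            return t
    return math.inf


# Toy labelled data: 1 = faithful, 0 = unfaithful.
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 1, 0, 1, 0, 0]
print(pick_threshold(scores, labels, max_false_accept=1 / 3))  # 0.6
print(pick_threshold(scores, labels, max_false_accept=0.0))    # 0.8
```

A stricter tolerance pushes the threshold up and rejects more faithful answers too (here 0.6 is lost at tolerance 0). This is the trade-off the stakeholders' risk tolerance has to settle.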